Estimation of the percentile of Birnbaum-Saunders distribution and its application to PM2.5 in Northern Thailand

The Birnbaum-Saunders distribution plays a crucial role in statistical analysis, serving as a model for failure time distribution in engineering and the distribution of particulate matter 2.5 (PM2.5) in environmental sciences. When assessing the health risks linked to PM2.5, it is crucial to give significant weight to percentile values, particularly focusing on lower percentiles, as they offer a more precise depiction of exposure levels and potential health hazards for the population. Mean and variance metrics may not fully encapsulate the comprehensive spectrum of risks connected to PM2.5 exposure. Various approaches, including the generalized confidence interval (GCI) approach, the bootstrap approach, the Bayesian approach, and the highest posterior density (HPD) approach, were employed to establish confidence intervals for the percentile of the Birnbaum-Saunders distribution. To assess the performance of these intervals, Monte Carlo simulations were conducted, evaluating them based on coverage probability and average length. The results demonstrate that the GCI approach is a favorable choice for estimating percentile confidence intervals. In conclusion, this article presents the results of the simulation study and showcases the practical application of these findings in the field of environmental sciences.


INTRODUCTION
PM2.5, also known as particulate matter 2.5, refers to tiny particles or droplets suspended in the air with a size not exceeding 2.5 micrometers. These minute particles fall under the category of airborne pollutants and can be composed of a variety of materials, including dust, soil, soot, smoke, and liquid droplets (Thangjai & Niwitpong, 2020). Due to their diminutive size, PM2.5 particles can be readily inhaled and penetrate deeply into the respiratory system. The presence of PM2.5 in the atmosphere raises notable environmental and health concerns, given its potential adverse impact on human health. Long-term exposure to PM2.5 is associated with a spectrum of health problems, including respiratory issues, cardiovascular ailments, and even premature mortality. Short-term exposure to elevated PM2.5 levels can exacerbate conditions such as asthma and bronchitis. The monitoring of PM2.5 levels is a routine practice in assessing air quality, and numerous countries have established air quality standards and regulations to limit the concentration of these fine particles in the air to protect public health. Residents in regions with heightened levels of PM2.5 are frequently advised to take precautions, like staying indoors during periods of poor air quality and using air purifiers to reduce their exposure. Thailand has encountered air quality challenges related to PM2.5, especially during the dry season when activities like agricultural and forest burning contribute to increased levels of particulate matter in the air. Areas such as Bangkok and the provinces of Chiang Mai, Mae Hong Son, and Lampang have experienced periods of subpar air quality due to PM2.5 pollution, prompting health advisories and government interventions to mitigate its impact. Thailand has initiated a variety of measures to combat air pollution, including the implementation of regulations, public awareness campaigns, and steps to reduce pollution sources, such as curbing open burning and enforcing vehicle emission standards. Much like many other governments globally, the Thai government has been actively working to enhance air quality and reduce the health risks associated with PM2.5 exposure. Many researchers have studied PM2.5 air pollution, including Broomandi et al. (2021) and Galán-Madruga et al. (2023).
The Birnbaum-Saunders distribution has become well-known in the fields of life testing and engineering. Its initial application was to model the time until failure, a consequence of the growth of a dominant crack under cyclic stress, where failure occurs once it surpasses a specified threshold (Birnbaum & Saunders, 1969a). Additionally, Bhattacharyya & Fries (1982) illustrated that the Birnbaum-Saunders distribution can function as an approximation of the inverse Gaussian distribution. Desmond (1986) also proposed that this distribution can be considered a balanced combination of the inverse Gaussian distribution and its reciprocal. In the context of statistical inference, Birnbaum & Saunders (1969b) introduced maximum likelihood estimation for the distribution's parameters. Ng, Kundu & Balakrishnan (2003) put forward a modified moment estimation method for these parameters, along with a technique to mitigate bias in both the maximum likelihood estimator and the modified moment estimator. Wu & Wong (2004) presented approximated confidence intervals for the Birnbaum-Saunders distribution using higher-order likelihood asymptotic procedures. Furthermore, Li & Xu (2016) carried out a comparative study involving the fiducial estimator, maximum likelihood estimator, and Bayesian estimator for the distribution's unknown parameters. Guo et al. (2017) developed approaches for interval estimation and hypothesis testing pertaining to the common mean of multiple Birnbaum-Saunders populations, utilizing a hybrid methodology that combines generalized inference and large sample theory. Jayalath (2021) applied a flexible Gibbs sampler to draw inferences about the two-parameter Birnbaum-Saunders distribution when dealing with right-censored data. Lastly, Puggard, Niwitpong & Niwitpong (2022) introduced novel techniques for estimating confidence intervals for both the variance and the difference in variances of Birnbaum-Saunders distributions, using PM2.5 data. Based on their work, we subsequently developed confidence intervals for the percentiles of Birnbaum-Saunders distributions, also leveraging PM2.5 data.
In statistical analysis, the mean and variance are the most frequently used measures, but in many practical scenarios percentiles are more informative. A percentile locates an observation relative to a specified percentage of values below it, and percentiles serve as indicators of both central tendency and variability. They describe the relative position of a specific data point within a dataset or a probability distribution by revealing the percentage of data points that lie below or above a designated value. The 25th percentile, also known as the first quartile, is the threshold below which 25% of the data points are situated. It is the upper boundary of the first quarter of the dataset. The 50th percentile, commonly referred to as the median, bisects the dataset into two equal halves, with half of the data points positioned below it and half above it. When two distributions must be compared, the researcher needs a specific parameter for the comparison. Researchers commonly select the mean as their primary reference point, presuming it to be the most dependable parameter for characterizing the population. Nonetheless, this preference is not universally applicable, and there are situations where the median is a more reliable reference, especially when dealing with a strongly skewed distribution (Price & Bonett, 2002). Meanwhile, the 75th percentile, or the third quartile, is the boundary below which 75% of the data points are positioned. It is the upper boundary of the first three-quarters of the dataset. Percentiles are invaluable for comprehending the distribution of data, identifying atypical data points, and facilitating comparative analyses. They are widely used in various fields, including education, where they aid in grading and ranking students, healthcare, for evaluating growth and health metrics, and finance, where they are instrumental in scrutinizing investment returns and associated risks. In situations where PM2.5 levels are high and pose a health threat, percentiles offer a more appropriate summary than the mean and variance. Several researchers have studied statistical inference for percentiles and quartiles, including Marshall & Walsh (1950), Harrell & Davis (1982), Kaigh & Lachenbruch (1982), Albers & Löhnberg (1984), Cox & Jaber (1985), Chang & Tang (1994), Padgett & Tomlinson (2003), Guo & Krishnamoorthy (2005), Huang & Johnson (2006), Navruz & Özdemir (2018), Hasan & Krishnamoorthy (2018), and Abdollahnezhad & Jafari (2018).
A multitude of scholars have delved into statistical investigations regarding the Birnbaum-Saunders distribution. Prominent contributors in this domain comprise Birnbaum & Saunders (1969b), Engelhardt, Bain & Wright (1981), Achcar (1993), Lu & Chang (1997), Ng, Kundu & Balakrishnan (2003), Wu & Wong (2004), Leiva et al. (2008), Wang (2012), Niu et al. (2014), Wang, Sun & Park (2016), and Guo et al. (2017). The objective of this article is to present confidence intervals for percentiles within the Birnbaum-Saunders distribution. Four distinct methods, namely the generalized confidence interval (GCI) approach, the bootstrap approach, the Bayesian approach, and the highest posterior density (HPD) approach, are employed to compute interval estimations for percentiles within the population. These methods make use of simulated data to establish the confidence intervals. To enhance their practical utility, a computer program has been created in the R programming language to compute coverage probability and average interval length. The article includes a numerical example to demonstrate the application of this program.

METHODS
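Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample of size $n$ drawn from a Birnbaum-Saunders distribution. The probability density function is defined by

$$f(x; \alpha, \beta) = \frac{1}{2\alpha\beta\sqrt{2\pi}} \left[ \left(\frac{\beta}{x}\right)^{1/2} + \left(\frac{\beta}{x}\right)^{3/2} \right] \exp\left[ -\frac{1}{2\alpha^2} \left( \frac{x}{\beta} + \frac{\beta}{x} - 2 \right) \right], \quad x > 0, \tag{1}$$

where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter.
According to Chang & Tang (1994) and Padgett & Tomlinson (2003), the percentile of $X$ is given by

$$\theta = \frac{\beta}{4} \left[ \alpha z_p + \sqrt{\alpha^2 z_p^2 + 4} \right]^2, \tag{2}$$

where $z_p = \Phi^{-1}(p)$ is the standard normal $p$-th quantile.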
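As a quick numerical check of Eq. (2), the following minimal Python sketch evaluates the percentile and confirms that the Birnbaum-Saunders distribution function returns $p$ at that point. The function names `bs_percentile` and `bs_cdf` are illustrative and are not taken from the article's R program; the distribution function $F(x) = \Phi\left[(\sqrt{x/\beta} - \sqrt{\beta/x})/\alpha\right]$ is the standard one for this distribution.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def bs_percentile(p, alpha, beta):
    """p-th percentile of BS(alpha, beta), following Eq. (2)."""
    z_p = STD_NORMAL.inv_cdf(p)
    return beta / 4.0 * (alpha * z_p + sqrt(alpha**2 * z_p**2 + 4.0)) ** 2


def bs_cdf(x, alpha, beta):
    """Distribution function of BS(alpha, beta)."""
    return STD_NORMAL.cdf((sqrt(x / beta) - sqrt(beta / x)) / alpha)


# Round trip: the distribution function evaluated at the p-th percentile
# should reproduce p (up to floating-point error).
for p in (0.25, 0.50, 0.75):
    theta = bs_percentile(p, alpha=0.5, beta=1.0)
    print(p, round(theta, 4), round(bs_cdf(theta, 0.5, 1.0), 4))
```

For $p = 0.5$ the quantile $z_p$ is zero, so Eq. (2) reduces to $\theta = \beta$: the scale parameter is the median of the distribution.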

Generalized confidence interval approach
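Following Puggard, Niwitpong & Niwitpong (2022), the generalized pivotal quantity (GPQ) for $\beta$, denoted $R_\beta$, is given by Eq. (3), where $\beta_1$ and $\beta_2$ are the two solutions for $\beta$ obtained by solving the accompanying equation. The GPQ for $\alpha$, denoted $R_\alpha$, is given by Eq. (4), where $R_\beta$ is defined in Eq. (3) and $s_1$ and $s_2$ are the observed values of $S_1$ and $S_2$, respectively.
The GPQ for $\theta$ is obtained by

$$R_\theta = \frac{R_\beta}{4} \left[ R_\alpha z_p + \sqrt{R_\alpha^2 z_p^2 + 4} \right]^2, \tag{5}$$

where $z_p = \Phi^{-1}(p)$ is the standard normal $p$-th quantile, $R_\beta$ is defined in Eq. (3), and $R_\alpha$ is defined in Eq. (4).
Therefore, the $100(1-\gamma)\%$ two-sided confidence interval for the percentile of the Birnbaum-Saunders distribution using the GCI approach is given by

$$CI_{\theta.GCI} = [L_{\theta.GCI}, U_{\theta.GCI}] = [R_\theta(\gamma/2), R_\theta(1-\gamma/2)], \tag{6}$$

where $R_\theta(\gamma/2)$ and $R_\theta(1-\gamma/2)$ denote the $100(\gamma/2)$-th and $100(1-\gamma/2)$-th percentiles of $R_\theta$, respectively. Algorithm 1 was used to construct the GCI for the percentile of the Birnbaum-Saunders distribution.

Algorithm 1
Step 1: Generate a sample from the Birnbaum-Saunders distribution
Step 2: Compute $A$, $B$, $C$, $S_1$, $S_2$
Step 3: At the $m$-th step
(a) Simulate $T \sim t(n-1)$, compute $R_\beta$ using Eq. (3), and compute $R_\alpha$ using Eq. (4)
(d) Compute $R_\theta$ using Eq. (5)
Step 4: Repeat step 3 a total of $M$ times and obtain an array of $R_\theta$'s
Step 5: Compute $L_{\theta.GCI} = R_\theta(\gamma/2)$ and $U_{\theta.GCI} = R_\theta(1-\gamma/2)$

Bootstrap approach
Suppose that $x = (x_1, x_2, \ldots, x_n)$ is a random sample drawn from a Birnbaum-Saunders distribution with shape parameter $\alpha$ and scale parameter $\beta$. Let $\hat{\alpha}$ and $\hat{\beta}$ be the maximum likelihood estimators of $\alpha$ and $\beta$, respectively, and let $\tilde{\alpha}_k$ and $\tilde{\beta}_k$, where $k = 1, 2, \ldots, B$, be the bias-corrected bootstrap estimates defined in Eqs. (9) and (10). Therefore, the bootstrap estimator of the percentile is obtained as

$$\hat{\theta}_k = \frac{\tilde{\beta}_k}{4} \left[ \tilde{\alpha}_k z_p + \sqrt{\tilde{\alpha}_k^2 z_p^2 + 4} \right]^2, \tag{11}$$

where $z_p = \Phi^{-1}(p)$ is the standard normal $p$-th quantile, $\tilde{\alpha}_k$ is defined in Eq. (9), and $\tilde{\beta}_k$ is defined in Eq. (10).
Algorithm 2 was used to construct the bootstrap confidence interval for the percentile of the Birnbaum-Saunders distribution.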
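The general shape of a parametric bootstrap interval for the percentile can be sketched in a few lines of Python. This is a simplified illustration and not the article's procedure: the modified moment estimators of Ng, Kundu & Balakrishnan (2003) are swapped in for the maximum likelihood estimators because they have a closed form, the bias correction of Eqs. (7)-(10) is omitted, and all function names (`rbs`, `mme`, `bs_percentile`, `bootstrap_interval`) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def rbs(n, alpha, beta, rng):
    """Draw n Birnbaum-Saunders variates from standard normal draws."""
    out = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0
        out.append(beta * (w + sqrt(w * w + 1.0)) ** 2)
    return out


def mme(x):
    """Modified moment estimates of (alpha, beta); a stand-in for the MLEs."""
    s = fmean(x)                            # arithmetic mean
    r = 1.0 / fmean(1.0 / xi for xi in x)   # harmonic mean
    alpha = sqrt(max(0.0, 2.0 * (sqrt(s / r) - 1.0)))
    return alpha, sqrt(s * r)


def bs_percentile(p, alpha, beta):
    """p-th percentile of BS(alpha, beta), Eq. (2)."""
    z_p = STD_NORMAL.inv_cdf(p)
    return beta / 4.0 * (alpha * z_p + sqrt(alpha**2 * z_p**2 + 4.0)) ** 2


def bootstrap_interval(x, p, B=500, gamma=0.05, rng=None):
    """Percentile-bootstrap interval from B parametric resamples (no bias correction)."""
    rng = rng or random.Random()
    alpha_hat, beta_hat = mme(x)
    thetas = sorted(
        bs_percentile(p, *mme(rbs(len(x), alpha_hat, beta_hat, rng)))
        for _ in range(B)
    )
    lower = thetas[int(B * gamma / 2)]            # empirical gamma/2 quantile
    upper = thetas[int(B * (1 - gamma / 2)) - 1]  # empirical 1 - gamma/2 quantile
    return lower, upper


rng = random.Random(2024)
x = rbs(50, alpha=0.5, beta=1.0, rng=rng)
lower, upper = bootstrap_interval(x, p=0.25, B=500, rng=rng)
print(f"95% bootstrap interval for the 25th percentile: [{lower:.3f}, {upper:.3f}]")
```

The interval limits are simply the empirical $\gamma/2$ and $1-\gamma/2$ quantiles of the $B$ bootstrap percentile estimates, which mirrors the final step of Algorithm 2.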

Bayesian approach
Algorithm 2
Step 1: Generate a sample from the Birnbaum-Saunders distribution
Step 2: At the $b$-th step
Compute $b(\hat{\alpha}, \alpha)$ using Eq. (7) and compute $b(\hat{\beta}, \beta)$ using Eq. (8)
(c) Compute $\tilde{\alpha}_k$ using Eq. (9) and $\tilde{\beta}_k$ using Eq. (10)
(d) Compute $\hat{\theta}_k$ using Eq. (11)
Step 3: Repeat step 2 a total of $B$ times and obtain an array of $\hat{\theta}_k$'s
Step 4: Compute $L_{\theta.B} = \hat{\theta}_k(\gamma/2)$ and $U_{\theta.B} = \hat{\theta}_k(1-\gamma/2)$

Thangjai et al. (2024), PeerJ, DOI 10.7717/peerj.17019

The Bayesian approach offers a structured way to integrate prior knowledge and revise beliefs as fresh data emerge. Assume that $t$ follows an inverse gamma distribution with parameters $a$ and $b$, represented as $IG(t \mid a, b)$. Wang, Sun & Park (2016) utilized inverse gamma distributions as prior distributions of $\beta$ and $\alpha^2$, denoted as $IG(\beta \mid a_1, b_1)$ and $IG(\alpha^2 \mid a_2, b_2)$, respectively. The marginal posterior distribution of $\beta$ is given in Eq. (13), and the conditional posterior distribution of $\alpha^2$ given $\beta$ is given in Eq. (14). The samples from Eqs. (13) and (14) are derived using Markov Chain Monte Carlo methods.
Wang, Sun & Park (2016) employed the generalized ratio-of-uniforms method, introduced by Wakefield, Gelfand & Smith (1991), to generate posterior samples for $\beta$. The method involves a pair of random variables $(u, v)$ uniformly distributed over the region

$$A(r) = \left\{ (u, v) : 0 < u \le \left[ \pi\left( \frac{v}{u^r} \,\Big|\, x \right) \right]^{1/(r+1)} \right\}, \tag{15}$$

where $r$ is a constant and $\pi(\cdot \mid x)$ is defined in Eq. (13). Therefore, $\beta = v/u^r$ has density $\pi(\beta \mid x) / \int \pi(\beta \mid x)\, d\beta$. The prospective variate $\beta = v/u^r$ is accepted if $u \le [\pi(\beta \mid x)]^{1/(r+1)}$; otherwise, the process is reiterated. The posterior samples for $\alpha^2$ are acquired using the LearnBayes package within the R software suite. Hence, the square root of $\alpha^2$ represents the posterior samples of $\alpha$.
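The accept-reject mechanics of the generalized ratio-of-uniforms method can be illustrated with a toy target in Python. The sketch below does not use the posterior of Eq. (13); it assumes an unnormalised Gamma(2, 1) density, $\beta e^{-\beta}$, chosen only because the bounds of the enclosing rectangle can be written down analytically. The names `target` and `ratio_of_uniforms` are hypothetical.

```python
import random
from math import exp


def target(beta):
    """Unnormalised toy density beta * exp(-beta), i.e. Gamma(shape=2, rate=1)."""
    return beta * exp(-beta)


def ratio_of_uniforms(n_draws, r=2.0, rng=None):
    """Generalized ratio-of-uniforms sampler for the toy target above."""
    rng = rng or random.Random()
    # Rectangle bounds for this particular target, found analytically:
    # target(beta) peaks at beta = 1, and
    # beta * target(beta)**(r/(r+1)) peaks at beta = (2r + 1)/r.
    a_r = target(1.0) ** (1.0 / (r + 1.0))
    b_mode = (2.0 * r + 1.0) / r
    b_plus = b_mode * target(b_mode) ** (r / (r + 1.0))

    draws = []
    while len(draws) < n_draws:
        u = rng.uniform(0.0, a_r)
        v = rng.uniform(0.0, b_plus)   # lower bound is 0 for this target
        denom = u**r
        if denom == 0.0:
            continue                   # guard against division by zero
        beta = v / denom               # prospective variate
        if u <= target(beta) ** (1.0 / (r + 1.0)):
            draws.append(beta)         # (u, v) fell inside the acceptance region
    return draws


draws = ratio_of_uniforms(20000, r=2.0, rng=random.Random(0))
print("sample mean:", sum(draws) / len(draws))  # the Gamma(2, 1) mean is 2
```

For a posterior such as Eq. (13), the two suprema would typically have to be found numerically rather than in closed form; the accept-reject loop itself is unchanged.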
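Therefore, the posterior distribution of $\theta$ is obtained as

$$\theta_{Baye} = \frac{\beta}{4} \left[ \alpha z_p + \sqrt{\alpha^2 z_p^2 + 4} \right]^2, \tag{19}$$

where $z_p = \Phi^{-1}(p)$ is the standard normal $p$-th quantile and $\alpha$ and $\beta$ are the posterior samples.
Therefore, the $100(1-\gamma)\%$ two-sided credible interval for the percentile of the Birnbaum-Saunders distribution using the Bayesian approach is given by

$$CI_{\theta.Baye} = [L_{\theta.Baye}, U_{\theta.Baye}] = [\theta_{Baye}(\gamma/2), \theta_{Baye}(1-\gamma/2)], \tag{20}$$

where $\theta_{Baye}(\gamma/2)$ and $\theta_{Baye}(1-\gamma/2)$ denote the $100(\gamma/2)$-th and $100(1-\gamma/2)$-th percentiles of $\theta_{Baye}$, respectively. Algorithm 3 was used to construct the Bayesian credible interval for the percentile of the Birnbaum-Saunders distribution.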

Highest posterior density approach
The HPD interval was constructed using the posterior distribution of $\theta$ as specified in Eq. (19). This interval has the smallest length among all intervals that contain $100(1-\gamma)\%$ of the posterior probability. Every point within the interval has a probability density at least as high as that of any point outside it, as explained by Box & Tiao (2011).
Therefore, the $100(1-\gamma)\%$ two-sided credible interval for the percentile of the Birnbaum-Saunders distribution using the HPD approach is given by

$$CI_{\theta.HPD} = [L_{\theta.HPD}, U_{\theta.HPD}], \tag{21}$$

where $L_{\theta.HPD}$ and $U_{\theta.HPD}$ are determined using the hdi function within the HDInterval package of the R software suite. Algorithm 4 was used to construct the HPD interval for the percentile of the Birnbaum-Saunders distribution.
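The difference between an equal-tailed credible interval and an HPD interval can be seen from a set of posterior draws alone. The Python sketch below computes both from a sample: the HPD interval is taken as the narrowest window of consecutive order statistics that covers a fraction $1-\gamma$ of the draws. It is an illustration of the idea rather than a port of the hdi function, and the right-skewed lognormal draws are a stand-in for posterior draws of $\theta$.

```python
import random
from math import ceil


def equal_tailed(draws, gamma=0.05):
    """Interval between the empirical gamma/2 and 1 - gamma/2 quantiles."""
    s = sorted(draws)
    n = len(s)
    return s[int(n * gamma / 2)], s[int(n * (1 - gamma / 2)) - 1]


def hpd(draws, gamma=0.05):
    """Shortest interval containing a fraction 1 - gamma of the draws."""
    s = sorted(draws)
    n = len(s)
    k = ceil((1 - gamma) * n)  # number of draws the interval must cover
    # Slide a window of k consecutive order statistics; keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


# Stand-in for posterior draws: a right-skewed sample.
rng = random.Random(7)
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
print("equal-tailed:", equal_tailed(draws))
print("HPD:         ", hpd(draws))
```

For a skewed posterior the HPD window shifts toward the mode, which is why the HPD interval is never longer than the equal-tailed interval covering the same number of draws.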

RESULTS
Algorithm 3
Step 1: Specify the values of $a_1$, $a_2$, $b_1$, $b_2$, and $r$, then compute $a(r)$ using Eq. (16) and compute $b^+(r)$ using Eq. (18)
Step 2: At the $i$-th step
(a) Generate $u$ from the uniform distribution with parameters 0 and $a(r)$, denoted as $U(0, a(r))$
(b) Generate $v$ from the uniform distribution with parameters 0 and $b^+(r)$, denoted as $U(0, b^+(r))$
(c) Compute $q = v/u^r$
(d) Accept $q$ and set $\beta^{(i)} = q$ if $u \le [\pi(q \mid x)]^{1/(r+1)}$; otherwise, repeat steps (a)-(c)
(e) Generate $\lambda$ from the inverse gamma distribution in Eq. (14), whose first parameter is $n/2 + a_2$, and compute $\alpha^{(i)} = \sqrt{\lambda}$
Step 3: Compute the posterior distribution of $\theta$, denoted as $\theta_{Baye}$, using Eq. (19)
Step 4: Repeat step 2 and step 3 a total of $M$ times and obtain an array of $\theta_{Baye}$'s
Step 5: Compute $L_{\theta.Baye} = \theta_{Baye}(\gamma/2)$ and $U_{\theta.Baye} = \theta_{Baye}(1-\gamma/2)$

In this study, confidence intervals are proposed using the GCI approach, the bootstrap approach, the Bayesian approach, and the HPD approach. A Monte Carlo simulation study was carried out in R to evaluate the performance of these confidence intervals for the percentile of the Birnbaum-Saunders distribution.
The evaluation involved comparing the performance of these confidence intervals in terms of coverage probabilities and average lengths. The most desirable confidence interval is defined as one that achieves a coverage probability at or above the nominal confidence level of 0.95 together with the shortest average length. Following Puggard, Niwitpong & Niwitpong (2022), normal random variables were used to generate the Birnbaum-Saunders random variables. For percentiles, the shape parameter was set as $\alpha = 0.10$, 0.25, 0.50, 0.75, and 1.00, while the scale parameter was fixed at $\beta = 1.00$. Moreover, for the Bayesian credible interval and the HPD interval, we considered $r = 2.00$ and set the hyperparameters $a_1$, $a_2$, $b_1$, and $b_2$ to $10^{-4}$. We conducted 5,000 replications, with 5,000 GPQ draws for the GCI, $B = 500$ for the bootstrap confidence interval, and $M = 1,000$ for the Bayesian credible interval and HPD interval.
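Generating Birnbaum-Saunders variates from normal variates relies on the standard representation $X = \beta \left[ \alpha Z/2 + \sqrt{(\alpha Z/2)^2 + 1} \right]^2$ with $Z \sim N(0, 1)$. A minimal Python sketch is given below; the function name `rbs` is illustrative and the article's own R code may differ in detail.

```python
import random
from math import sqrt


def rbs(n, alpha, beta, rng=random):
    """Draw n Birnbaum-Saunders(alpha, beta) variates from standard normal draws."""
    sample = []
    for _ in range(n):
        w = alpha * rng.gauss(0.0, 1.0) / 2.0           # w = alpha * Z / 2
        sample.append(beta * (w + sqrt(w * w + 1.0)) ** 2)
    return sample


# Example: one simulated data set with alpha = 0.5 and beta = 1.
rng = random.Random(123)
x = rbs(20000, alpha=0.5, beta=1.0, rng=rng)
print("sample mean:", sum(x) / len(x))  # theoretical mean is beta * (1 + alpha**2 / 2)
```

Because $Z$ and $-Z$ have the same distribution, the same construction also shows that $\beta$ is the median of $X$.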
Algorithm 5 was used to compute the coverage probabilities and average lengths of the proposed confidence intervals for the percentile of Birnbaum-Saunders distribution.
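The Monte Carlo loop behind the coverage probability and average length can be sketched as follows. To keep the Python example self-contained and fast, it evaluates a simplified parametric bootstrap interval (modified moment estimators, no bias correction) with far fewer replications than the 5,000 used in the study; the function names and the small `reps` and `B` values are assumptions made for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def rbs(n, alpha, beta, rng):
    """Birnbaum-Saunders variates generated from standard normal draws."""
    ws = (alpha * rng.gauss(0.0, 1.0) / 2.0 for _ in range(n))
    return [beta * (w + sqrt(w * w + 1.0)) ** 2 for w in ws]


def bs_percentile(p, alpha, beta):
    """p-th percentile of BS(alpha, beta), Eq. (2)."""
    z_p = STD_NORMAL.inv_cdf(p)
    return beta / 4.0 * (alpha * z_p + sqrt(alpha**2 * z_p**2 + 4.0)) ** 2


def mme(x):
    """Modified moment estimates of (alpha, beta); a stand-in for the MLEs."""
    s, r = fmean(x), 1.0 / fmean(1.0 / xi for xi in x)
    return sqrt(max(0.0, 2.0 * (sqrt(s / r) - 1.0))), sqrt(s * r)


def bootstrap_interval(x, p, rng, B=100, gamma=0.05):
    """Simplified parametric percentile-bootstrap interval."""
    a_hat, b_hat = mme(x)
    thetas = sorted(
        bs_percentile(p, *mme(rbs(len(x), a_hat, b_hat, rng))) for _ in range(B)
    )
    return thetas[int(B * gamma / 2)], thetas[int(B * (1 - gamma / 2)) - 1]


def coverage_and_length(interval_fn, n, alpha, beta, p, reps, rng):
    """Estimate coverage probability and average length by simulation."""
    theta = bs_percentile(p, alpha, beta)  # true percentile
    hits, total_length = 0, 0.0
    for _ in range(reps):
        lower, upper = interval_fn(rbs(n, alpha, beta, rng), p, rng)
        hits += lower <= theta <= upper    # 1 if the interval covers theta
        total_length += upper - lower
    return hits / reps, total_length / reps


cp, al = coverage_and_length(
    bootstrap_interval, n=50, alpha=0.5, beta=1.0, p=0.25,
    reps=100, rng=random.Random(1),
)
print(f"coverage probability = {cp:.3f}, average length = {al:.3f}")
```

Any of the four interval constructors could be passed as `interval_fn`; with only 100 replications the estimated coverage is itself noisy, so the printed values are indicative only.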
Algorithm 4
Step 1: Specify the values of $a_1$, $a_2$, $b_1$, $b_2$, and $r$, then compute $a(r)$ using Eq. (16) and compute $b^+(r)$ using Eq. (18)
Step 2: At the $i$-th step
(a) Generate $u$ from the uniform distribution with parameters 0 and $a(r)$, denoted as $U(0, a(r))$
(b) Generate $v$ from the uniform distribution with parameters 0 and $b^+(r)$, denoted as $U(0, b^+(r))$
(c) Compute $q = v/u^r$
(d) Accept $q$ and set $\beta^{(i)} = q$ if $u \le [\pi(q \mid x)]^{1/(r+1)}$; otherwise, repeat steps (a)-(c)
(e) Generate $\lambda$ from the inverse gamma distribution in Eq. (14), whose first parameter is $n/2 + a_2$, and compute $\alpha^{(i)} = \sqrt{\lambda}$
Step 3: Compute the posterior distribution of $\theta$, denoted as $\theta_{Baye}$, using Eq. (19)
Step 4: Repeat step 2 and step 3 a total of $M$ times and obtain an array of $\theta_{Baye}$'s
Step 5: Compute $L_{\theta.HPD}$ and $U_{\theta.HPD}$

Algorithm 5
Step 1: Use Algorithm 1-Algorithm 4 to construct the confidence intervals
Step 2: If $L_\theta \le \theta \le U_\theta$, set $p = 1$; otherwise, set $p = 0$
Step 3: Compute $U_\theta - L_\theta$
Step 4: Repeat step 1-step 3 a total of 5,000 times
Step 5: Compute the mean of $p$, which defines the coverage probability
Step 6: Compute the mean of $U_\theta - L_\theta$, which defines the average length

The findings are based on the following simulation work. The performances of the proposed confidence intervals for the percentile of the Birnbaum-Saunders distribution are presented in Table 1 and displayed in Figs. 1 and 2. From Table 1, for $n \le 50$, the coverage probabilities of all the proposed confidence intervals were lower than the nominal confidence level of 0.95. However, the GCI had coverage probabilities close to the nominal confidence level of 0.95. For $n = 100$, the coverage probabilities of both the GCI and the Bayesian credible interval exceeded the nominal confidence level of 0.95 in some cases. Figures 1 and 2 present the coverage probabilities and average lengths of the confidence intervals for the percentile, corresponding to various sample sizes and shape parameters, respectively. According to Fig. 1, the coverage probabilities approached the nominal confidence level of 0.95 as the sample size increased. Furthermore, the average lengths of all approaches decreased as the sample size increased. Based on the simulation results presented in Fig. 2, the coverage probabilities approached the nominal confidence level of 0.95 as the shape parameter increased. Additionally, the average lengths of all approaches increased with the shape parameter.

EMPIRICAL APPLICATION
The GCI, bootstrap, Bayesian, and HPD approaches can be employed to calculate confidence intervals for the percentile of PM2.5 levels in Mae Hong Son province and Lampang province, Thailand. The suitability of seven candidate probability models for fitting the daily PM2.5 level data was assessed using the Akaike Information Criterion (AIC). Table 3 displays the AIC values for these seven probability models, calculated based on the PM2.5 level data from both provinces. The results in Table 3 indicated that the Birnbaum-Saunders distribution is the most appropriate model for fitting the daily PM2.5 level data in both Mae Hong Son and Lampang provinces, as it yields the lowest AIC value. Table 4 reports the statistics of daily PM2.5 level data in Mae Hong Son and Lampang provinces. Table 5 displays 95% two-sided confidence intervals for the percentile of daily PM2.5 level data in these provinces, using the GCI, bootstrap, Bayesian, and HPD approaches. The results reveal that all confidence intervals encompass the true percentiles. Regarding interval length, the HPD approach yielded the shortest intervals for the daily PM2.5 level data percentiles, while the GCI approach produced the longest. In the simulation, the bootstrap, Bayesian, and HPD intervals were likewise shorter than the GCI, but their coverage probabilities were below the nominal confidence level of 0.95. Additionally, the coverage probability and average length in the simulation were calculated using 5,000 random samples, whereas the length in the example was computed using a single sample, so the simulation results carry more weight. Consequently, the bootstrap, Bayesian, and HPD approaches are not recommended for constructing confidence intervals for the percentile.

Table 3
The estimated AIC values for the probability models using the PM2.5 level data from Mae Hong Son and Lampang provinces.
Distributions Mae Hong Son province Lampang province

DISCUSSION
In the field of environmental sciences and air quality, percentiles serve as a means to characterize the majority of PM2.5 levels. Utilizing the confidence interval for the percentile of PM2.5 levels enables an estimation of the predominant PM2.5 levels. As a result, leveraging the estimated percentile of PM2.5 levels can contribute to strategic planning for the reduction of air pollutants. Puggard, Niwitpong & Niwitpong (2022) introduced the HPD approach for constructing confidence intervals for the variance and difference of variances of Birnbaum-Saunders distributions. Nevertheless, in certain situations, percentiles may be more suitable than variance. Hence, the aim of this study was to estimate the percentiles of the Birnbaum-Saunders distribution. Confidence intervals for the percentile of the Birnbaum-Saunders distribution were generated using the GCI, bootstrap, Bayesian, and HPD approaches. The GCI approach exhibited strong performance in constructing these intervals. All four approaches utilized simulation data to create these confidence intervals. The GCI approach leverages GPQs for interval construction, while the bootstrap approach relies on the sampling distribution. In contrast, the Bayesian and HPD approaches are rooted in prior distributions. The study results suggest that the GCI approach is the preferred method for constructing confidence intervals for the percentile of the Birnbaum-Saunders distribution. This conclusion is consistent with the findings of previous studies by Ye, Ma & Wang (2010), Thangjai, Niwitpong & Niwitpong (2018), and Thangjai & Niwitpong (2022).

CONCLUSION
The confidence intervals for the percentile of the Birnbaum-Saunders distribution were established using four approaches: GCI, bootstrap, Bayesian, and HPD. According to the simulation results, the average lengths of the bootstrap, Bayesian, and HPD intervals were shorter than those of the GCI, but their coverage probabilities fell below the nominal confidence level of 0.95. As a result, the bootstrap, Bayesian, and HPD approaches are not recommended for constructing confidence intervals for the percentile. The findings consistently highlight the GCI approach as the most reliable in terms of coverage probability. Therefore, the GCI approach is recommended for constructing confidence intervals for the percentile.

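A bootstrap sample is drawn from the Birnbaum-Saunders distribution with parameters $\hat{\alpha}$ and $\hat{\beta}$. Hence, $\hat{\alpha}^*$ and $\hat{\beta}^*$ are acquired by utilizing $B$ bootstrap samples. Suppose that $b(\hat{\alpha}, \alpha)$ and $b(\hat{\beta}, \beta)$ are the bias estimators of $\hat{\alpha}$ and $\hat{\beta}$, respectively. According to Puggard, Niwitpong & Niwitpong (2022), the estimators for $b(\hat{\alpha}, \alpha)$ and $b(\hat{\beta}, \beta)$ are given in Eqs. (7) and (8). Following MacKinnon & Smith (1998), the respective corrected estimates for $\hat{\alpha}^*$ and $\hat{\beta}^*$ are given in Eqs. (9) and (10).

To sample random points in $A(r)$, the random variables $(u, v)$ are generated from a uniform distribution over the two-dimensional bounded rectangle $[0, a(r)] \times [b^-(r), b^+(r)]$, where $a(r)$, $b^-(r)$, and $b^+(r)$ are given by

$$a(r) = \sup_{\beta > 0} \left[ \pi(\beta \mid x) \right]^{1/(r+1)}, \tag{16}$$

$$b^-(r) = \inf_{\beta > 0} \beta \left[ \pi(\beta \mid x) \right]^{r/(r+1)}, \tag{17}$$

$$b^+(r) = \sup_{\beta > 0} \beta \left[ \pi(\beta \mid x) \right]^{r/(r+1)}. \tag{18}$$

Wang, Sun & Park (2016) showed that $a(r)$ and $b^+(r)$ are finite and $b^-(r) = 0$.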

Figure 1

Figure 2
Comparison of the coverage probabilities and average lengths of the confidence intervals for the percentile according to shape parameters. (A) Coverage probability. (B) Average lengths.

Table 1
The coverage probabilities and average lengths of 95% two-sided confidence intervals for the percentile of Birnbaum-Saunders distribution. Bold font means the confidence interval with coverage probability greater than or equal to 0.95 and the shortest average length.

Table 2
Daily PM2.5 levels data in Mae Hong Son and Lampang provinces.

Table 4
Sample statistics for the daily PM2.5 level data in Mae Hong Son and Lampang provinces.

Table 5
The lower limit ($L_\theta$) and upper limit ($U_\theta$) of the 95% confidence intervals for the percentile of the daily PM2.5 level data in Mae Hong Son and Lampang provinces.